ABSTRACT




In picture processing, an important problem is to identify two digital pictures of the same scene taken
under different lighting conditions. This kind of problem arises in remote sensing, satellite signal
processing, and related areas. The identification can be done by transforming the gray levels so that the
gray-level histograms of the two pictures are closely matched. The transformation problem can be solved by
using the 'packing' method. In this paper, we propose a VLSI architecture consisting of m × n processing
elements with extensive parallel and pipelining computation capabilities, which speeds up the transformation to
time complexity O(max(m, n)), where m and n are the numbers of gray levels of the input picture and the
reference picture, respectively. With a uniprocessor and a straightforward dynamic programming algorithm, the
time complexity is O(m^3 × n). The algorithm partition problem, an important issue in VLSI design, is discussed.
Verification of the proposed architecture is also given.

Index terms: Digital picture comparison, packing algorithm, very large scale integration (VLSI), algorithm
partition, VLSI architecture verification.


I. INTRODUCTION 

The technique of dynamic programming has wide applications in computer science [6,7] for solving
mathematical problems arising from multistage decision processes. Being based on a path-finding algorithm, the
technique of dynamic programming is both mathematically sound and computationally efficient. The recent advent
of very-large-scale integration (VLSI) technology has prompted interest in implementing some algorithms directly
in hardware with extensive parallel and pipelining computation capabilities. The use of VLSI architectures to
implement dynamic programming procedures has been investigated for several applications. Guibas et al. [8]
describe a VLSI architecture for a class of dynamic programming problems characterized by optimal
parenthesization. Chu and Fu [9] describe VLSI architectures for recognition of context-free and finite-state
languages. Chiang and Fu [10] describe a VLSI implementation of Earley's algorithm for parsing general
context-free languages. Cheng and Fu [11] describe algorithm partition and parallel recognition of general
context-free languages using fixed-size VLSI architecture. Liu and Fu [12] describe a VLSI implementation for
string-distance computation. Clarke and Dyer [15] describe four VLSI architectures for line and curve detection.
Cheng and Fu [13,14] propose VLSI architectures for pattern matching and hand-written symbol recognition. In this
paper, we propose a VLSI architecture for determining whether two digital pictures were taken of the same scene
under different lighting conditions. This is an important problem in remote sensing, satellite signal processing,
and other areas. As an important issue in VLSI design, the algorithm partition problem is discussed. The
backtracking procedure is also discussed in detail, and a formal verification of the proposed architecture is
given. An example is used to illustrate the operation of the proposed VLSI architecture.


II. PRELIMINARY 

The image matching technique has been used extensively in many applications, such as curvature sequence
detection [2], template matching and pattern matching [1], character recognition, target recognition, aerial
navigation and stereo mapping, picture matching, earth resource analysis, missile guidance, intelligence
gathering systems, and robotics [2,3].

There are many situations in which we want to match or register two pictures with one another, or match 
some given pattern with a picture [2]. For example: 

(a) Given two or more pictures of the same scene taken by different sensors, we want to determine the 
characteristics of each pixel with respect to all of the sensors and then we can classify the pixels. 

(b) Given two pictures of a scene taken at different times, we want to determine the points at which they
differ and then analyze the changes that have taken place.

(c) Given two pictures of a scene taken from different positions, we want to identify corresponding points
in the pictures and then determine their distances from the camera to obtain three-dimensional
information about the scene.

(d) We want to find places in a picture where it matches a given pattern.

In this paper, we discuss another very important problem in picture processing: identifying two digital
pictures of the same scene taken under different lighting conditions. Such problems arise in many areas, such as
remote sensing and satellite signal processing. The identification can be done by transforming the gray levels
so that the gray-level histograms of the two pictures are closely matched.

Mathematically, a picture is defined by a function of two variables F(x, y), where F(x, y) is the brightness,
or a K-tuple of brightness values in several spectral bands [2,4,5], and x and y are the coordinates in the image
plane. In the black-and-white case, the values are called gray levels. These values are real, non-negative,
and bounded. The pictures are represented as matrices whose integer elements are the pixels. A gray-level
histogram of an image is a function that gives the frequency of occurrence of each gray level in the image.
When the gray levels are quantized from 0 to n, the value of the histogram at a particular gray level p,
denoted H(p), is the number or fraction of pixels in the image with that gray level [5]. When pictures of the
same scene are obtained under different lighting conditions, their histograms differ. To identify such pictures,
we can transform their gray-level scales so that their histograms closely match each other.
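The histogram definition above can be illustrated with a minimal Python sketch. It assumes a picture is given as a list of rows of integer gray levels in 0..levels-1; the function name `gray_histogram` and its `normalize` flag (which switches from the number to the fraction of pixels) are hypothetical, introduced only for illustration.

```python
from collections import Counter


def gray_histogram(picture, levels, normalize=False):
    """Return [H(0), ..., H(levels-1)]: the number (or fraction) of pixels at each gray level."""
    pixels = [p for row in picture for p in row]
    if any(not 0 <= p < levels for p in pixels):
        raise ValueError("gray level outside 0..levels-1")
    counts = Counter(pixels)
    hist = [counts[p] for p in range(levels)]  # Counter returns 0 for absent levels
    if normalize:
        return [c / len(pixels) for c in hist]
    return hist


picture = [[0, 1, 1],
           [3, 1, 0]]
print(gray_histogram(picture, levels=4))  # [2, 3, 0, 1]
```

Two pictures of the same scene under different lighting would typically give histograms shifted or stretched along the gray-level axis, which is what the transformation below tries to undo.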


Assume that H_1 and H_2 are the histograms of two pictures of the same scene with m and n gray levels,
respectively. An algorithm is proposed to "reshape" H_1 (i.e., rescale its gray levels) so that it has the
minimal deviation from H_2. The mathematical problem is defined by:


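$$Z = \min \sum_{j=1}^{n} \left| H_2(j) - \sum_{i=P}^{Q} H_1(i) \right|, \qquad P = X_{j-1}, \quad Q = X_j - 1, \tag{1}$$

subject to

$$1 = X_0 \le X_1 \le \cdots \le X_n = m + 1, \qquad X_j \text{ integer for } j = 1, \ldots, n.$$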

This transforms the gray levels X_{j-1}, ..., X_j - 1 in one picture into gray level j in the other picture,
for suitably chosen X_{j-1} and X_j, j = 1, ..., n.
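To make formulation (1) concrete, here is a small Python sketch that evaluates the objective for one boundary vector X. It assumes 0-based Python lists `h1` and `h2` holding H_1(1..m) and H_2(1..n), and it reads the constraint as non-decreasing, so X_{j-1} = X_j (an empty box) is allowed. The name `packing_error` is hypothetical.

```python
def packing_error(h1, h2, x):
    """Accumulated error of formulation (1) for the boundary vector x = [X_0, ..., X_n].

    h1[i - 1] is H1(i) and h2[j - 1] is H2(j); gray levels are 1-based as in the text.
    """
    m, n = len(h1), len(h2)
    if len(x) != n + 1 or x[0] != 1 or x[-1] != m + 1:
        raise ValueError("x must have n + 1 entries with X_0 = 1 and X_n = m + 1")
    if any(a > b for a, b in zip(x, x[1:])):
        raise ValueError("x must be non-decreasing")
    total = 0
    for j in range(1, n + 1):
        p, q = x[j - 1], x[j] - 1                          # P = X_{j-1}, Q = X_j - 1
        packed = sum(h1[i - 1] for i in range(p, q + 1))  # empty sum (0) when P > Q
        total += abs(h2[j - 1] - packed)
    return total


h1, h2 = [3, 1, 4, 2], [4, 6]
print(packing_error(h1, h2, [1, 3, 5]))  # 0: levels 1-2 -> level 1, levels 3-4 -> level 2
print(packing_error(h1, h2, [1, 2, 5]))  # 2
```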

This problem can be interpreted as a packing problem: to pack m objects of sizes {H_1(1), ..., H_1(m)} into n
boxes of spaces {H_2(1), ..., H_2(n)} in such a way that

(i) if the ith object has been placed in the jth box, the (i+1)th object is not allowed to be packed into
the kth box for any k < j, and

(ii) the accumulated error due to over-packed or leftover space is minimized.
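For very small instances, the packing interpretation can be checked by exhaustive search over all admissible boundary vectors. This Python sketch is exponential in n and is meant only as a reference for testing faster methods; the name `brute_force_packing` is hypothetical, and ties are broken in favor of the first boundary vector generated.

```python
from itertools import combinations_with_replacement


def brute_force_packing(h1, h2):
    """Try every 1 = X_0 <= X_1 <= ... <= X_n = m + 1; return (minimal error, best X)."""
    m, n = len(h1), len(h2)
    prefix = [0]
    for value in h1:
        prefix.append(prefix[-1] + value)   # prefix[k] = H1(1) + ... + H1(k)
    best = None
    # combinations_with_replacement yields non-decreasing tuples, i.e. X_1..X_{n-1}.
    for inner in combinations_with_replacement(range(1, m + 2), n - 1):
        x = (1,) + inner + (m + 1,)
        # Box j holds levels X_{j-1}..X_j - 1, whose total is prefix[X_j - 1] - prefix[X_{j-1} - 1].
        err = sum(abs(h2[j - 1] - (prefix[x[j] - 1] - prefix[x[j - 1] - 1]))
                  for j in range(1, n + 1))
        if best is None or err < best[0]:
            best = (err, x)
    return best


print(brute_force_packing([2, 2, 2], [3, 3]))  # (2, (1, 2, 4))
```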


Such a problem can be solved by dynamic programming techniques. Let S_j(i) be the minimal accumulated error
caused by transforming the gray levels 1, ..., i into the gray levels 1, ..., j. The recursive formula is given
by

$$S_j(i) = \min_{0 \le u \le i} \left\{ S_{j-1}(u) + \left| H_2(j) - \sum_{v=u+1}^{i} H_1(v) \right| \right\}, \qquad i = 1, \ldots, m, \quad j = 1, \ldots, n, \tag{2}$$

where the initial conditions are $S_0(0) = 0$, $S_0(i) = \sum_{v=1}^{i} H_1(v)$ for all $i = 1, \ldots, m$, and
$S_j(0) = \sum_{u=1}^{j} H_2(u)$ for all $j = 1, \ldots, n$. An empty sum is zero: if $i > j$, then
$\sum_{k=i}^{j} H_1(k) = 0$. The minimal accumulated error, $S_n(m)$, can then be computed.
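Recurrence (2) and its initial conditions translate directly into a table-filling loop. In this Python sketch (hypothetical name `dp_table`), the inner sum is read from a prefix-sum array instead of being re-accumulated. That is a deviation from the straightforward procedure: it lowers the uniprocessor cost from O(m^3 × n) to O(m^2 × n) without changing any S_j(i).

```python
def dp_table(h1, h2):
    """Fill S[j][i] = S_j(i) by recurrence (2); S[n][m] is the minimal accumulated error.

    h1[i - 1] is H1(i) and h2[j - 1] is H2(j).
    """
    m, n = len(h1), len(h2)
    prefix = [0]
    for value in h1:
        prefix.append(prefix[-1] + value)        # prefix[i] = H1(1) + ... + H1(i)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, m + 1):
        S[0][i] = prefix[i]                      # S_0(i) = sum_{v=1}^{i} H1(v)
    for j in range(1, n + 1):
        S[j][0] = S[j - 1][0] + h2[j - 1]        # S_j(0) = sum_{u=1}^{j} H2(u)
        for i in range(1, m + 1):
            # sum_{v=u+1}^{i} H1(v) = prefix[i] - prefix[u]; u = i is the empty-box case.
            S[j][i] = min(S[j - 1][u] + abs(h2[j - 1] - (prefix[i] - prefix[u]))
                          for u in range(i + 1))
    return S


print(dp_table([2, 2, 2], [3, 3])[2][3])  # 2
```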

A straightforward execution of this procedure obtains the optimal solutions for all (i, j) pairs with time
complexity O(m^3 × n) on a uniprocessor. In this paper, we propose an m × n VLSI array to speed up the
computation. The time complexity of the proposed architecture is O(max(m, n)).


III. VLSI DIGITAL PICTURE COMPARATOR 
3.1 The algorithm and its VLSI implementation 

We propose a VLSI architecture based on the space-time domain expansion approach [14,15], which has a very
natural and regular configuration and can be implemented easily with today's VLSI technology. Another important
issue in VLSI design, the algorithm partition problem, is also addressed for the proposed VLSI architecture. The
proposed VLSI architecture can speed up the digital picture comparison procedure greatly by using extensive
parallel and pipelining techniques. Before discussing the VLSI architecture in detail, we present the following
algorithm.

Let H_1 and H_2 be the histograms of two pictures taken of the same scene with m and n gray levels,
respectively.
Algorithm 1: The algorithm for digital picture comparison

begin
   S_0(0) := 0;
   for i := 1 to m do
   begin
      S_0(i) := 0;
      for k := 1 to i do
         S_0(i) := S_0(i) + H_1(k)
   end;

   for j := 1 to n do
   begin
      S_j(0) := 0;
      for k := 1 to j do
         S_j(0) := S_j(0) + H_2(k)
   end;

   for i := 1 to m do
      for u := 0 to i-1 do
      begin
         v := u + 1;
         T(v) := 0;
         for k := v to i do
            T(v) := H_1(k) + T(v)
      end;

   for i := 1 to m do
      for j := 1 to n do
      begin
         T* := S_{j-1}(i) + H_2(j);        (case u = i: box j is left empty)
         I := i;                           (index channel value)
         for u := 0 to i-1 do
         begin
            v := u + 1;
            s := 0;
            for k := v to i do
               s := s + H_1(k);
            T := |H_2(j) - s|;
            T := S_{j-1}(u) + T;
            if T < T* then
               T* := T and output u to the index channel by letting I := u
         end;
         S_j(i) := T*;
         Append index-pair (I, j-1) to index-pair (i, j) when the identification signal arrives, and form
         {(I, j-1), (i, j)}
      end
end.
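A sequential Python rendering of Algorithm 1 may help in checking the pseudocode. It is a sketch under stated assumptions: the index channel value I is taken to be the predecessor u (so the appended pair is (u, j-1)), ties keep the empty-box case u = i, and the dictionary `pred` stands in for the index-pair appending. The names `algorithm1` and `pred` are hypothetical. The inner sum is recomputed for every u, so the work is O(m^3 × n), matching the straightforward procedure.

```python
def algorithm1(h1, h2):
    """Return (S, pred) with S[j][i] = S_j(i) and pred[(i, j)] = I, the predecessor index."""
    m, n = len(h1), len(h2)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, m + 1):
        S[0][i] = S[0][i - 1] + h1[i - 1]      # S_0(i) = H1(1) + ... + H1(i)
    for j in range(1, n + 1):
        S[j][0] = S[j - 1][0] + h2[j - 1]      # S_j(0) = H2(1) + ... + H2(j)
    pred = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best, index = S[j - 1][i] + h2[j - 1], i   # u = i: box j stays empty
            for u in range(i):
                s = 0
                for k in range(u + 1, i + 1):          # s = H1(u+1) + ... + H1(i)
                    s += h1[k - 1]
                t = S[j - 1][u] + abs(h2[j - 1] - s)
                if t < best:                           # keep the smaller error and its index
                    best, index = t, u
            S[j][i] = best
            pred[(i, j)] = index                       # index pair (I, j-1) -> (i, j)
    return S, pred


S, pred = algorithm1([2, 2, 2], [3, 3])
print(S[2][3], pred[(3, 2)])  # 2 1
```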

We can build a VLSI array with m × n processing elements. Each processing element has a subtracter, which
produces the absolute value of the difference of its two inputs, and a comparator, which compares two input
values and outputs the smaller value, with the corresponding index, to the next processing element below it. The
functions performed by the (i, j)th processing element are as follows:

Input: H_2(j), the outputs of the (i-1, j)th processing element, the partial sums of H_1(v), index-pairs,
S_{j-1}(i-1) and S_{j-1}(i).

Output: S_j(i) and the index-pair to the right element when the identification signal arrives, and the
intermediate results to the processing element below.

Operations: Each processing element has a local connection to the processing element beneath it, which accepts
the intermediate results, including the accumulated errors and the index-pairs, and a local connection to the
right processing element, which receives S_j(i) and index pair (i, j) when the identification signal arrives.
Each processing element can perform accumulation, |a - b|, and comparison operations, each of which requires one
time unit. The adder is a combinational circuit, which does not require a time unit, or whose delay is much
smaller than a time unit. The data move among the processing elements at one processing element per time unit.

1) The input data H_1(v) arrive at the (i, j)th PE, which performs the accumulation of each of them in one time
unit; computing $\left| H_2(j) - \sum_{k=v}^{i} H_1(k) \right|$ needs one more time unit.

2) S_{j-1}(i-1) arrives at the (i, j)th processing element, which performs the operation
$S_{j-1}(i-1) + |H_2(j) - H_1(i)|$; this requires one time unit. The result is delayed one time unit.

3) $\min_{0 \le u \le i-2} \left\{ S_{j-1}(u) + \left| H_2(j) - \sum_{v=u+1}^{i} H_1(v) \right| \right\}$ arrives at the
processing element from the (i-1, j)th processing element and is compared with the result of step 2, which
requires one time unit. At the same time, the identification signal arrives, and the result is compared with
S_{j-1}(i) + H_2(j), which requires one more time unit. Then S_j(i) and the index-pair are sent to the
(i, j+1)th processing element.

Algorithm 2: VLSI implementation of Algorithm 1

Input: Gray-level histogram values of the input picture, H_1(i), and of the reference picture, H_2(j)
(for 1 ≤ i ≤ m, 1 ≤ j ≤ n); indices; index pairs; initial conditions S_0(0), S_0(i), and S_j(0)
(for 1 ≤ i ≤ m, 1 ≤ j ≤ n); and identification signals.

Output: The accumulated errors S_j(i) and the corresponding index pairs.

Move the histogram values H_2(j) of the reference picture, the identification signals, and the index j from the
top to the bottom of the VLSI array, one processing element per time unit. Move the histogram values H_1(i) of
the input picture and the index i from the left to the right of the VLSI array, one processing element per time
unit. The identification signals are sent at the fifth time unit; they move down one processing element per two
time units and move to the right one processing element per time unit. When the identification signal arrives,
it opens the connection channel from the comparator to the right processing element, and S_j(i) is sent to the
processing element (i, j+1). To obtain the 'packing' sequence, we have to perform a backtracking procedure,
which can be done in several ways, as follows.

1) Output the accumulated-error matrix S and/or the index-pairs to the host machine, which performs the
backtracking procedure.

2) Attach another VLSI module and use the tag of the index pair as the search key to perform the backtracking
procedure.

3) Expand the 'append' operation to one that appends the index to the index list of its ancestor. An index list
is formed by appending an index or an index list. We can use the index (m, n) as the tag to find the 'packing'
sequence. This turns the backtracking procedure into a forward one and speeds up the computation, but it
requires a large output channel capacity, especially for the processing elements located at the upper-right
corner. The upper bound of the channel capacity for the (i, j) processing element will be (1 + j + i).

4) Add to each processing element an index register consisting of two parts, the first part for the first index
and the second part for the second index. The second part of the index-pair register is compared with the tag.
If they match, the second part is output to the output channel and is also output to the first part as the tag
for its top and left neighbors. The tag moves up until it matches another index pair, and the procedure
continues. We need one index register for each processing element. At the (2i+j+3)th time unit, send a
backtracking signal, which moves along the channels connecting each processing element to its left neighbor and
to the one on top of it, one processing element per time unit. The index (m, n) is used as the tag of the
(m, n)th processing element. This procedure needs at most (m + n) time units to complete.
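Option 1 above (backtracking on the host) can be sketched in a few lines of Python. The sketch assumes the host receives, for every (i, j) with i > 0, the index I of the predecessor state S_{j-1}(I), stored as `pred[(i, j)]`; `backtrack` and `pred` are hypothetical names. It returns the boundary vector X_0, ..., X_n of formulation (1). Box 1 is always closed at X_0 = 1: with non-negative histogram values, the triangle inequality shows that folding any levels still unassigned at j = 1 into box 1 never increases the error.

```python
def backtrack(pred, m, n):
    """Follow index pairs (I, j-1) -> (i, j) back from (m, n) and return [X_0, ..., X_n]."""
    x = [0] * (n + 1)
    x[n] = m + 1          # X_n = m + 1
    i = m                 # gray levels 1..i still have to be placed in boxes 1..j
    for j in range(n, 1, -1):
        u = pred[(i, j)] if i > 0 else 0   # S_j(0): all remaining boxes are empty
        x[j - 1] = u + 1                   # box j holds gray levels u+1..i
        i = u
    x[0] = 1              # X_0 = 1: box 1 receives the remaining levels 1..i
    return x


print(backtrack({(3, 2): 1}, 3, 2))  # [1, 2, 4]
```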

From the above discussion, we conclude that the proposed architecture can compare two digital pictures by
transforming their gray levels. In many applications, only the summation error is required; in such cases, we
can further simplify the structure of the processing element and of the entire VLSI architecture. If there are P
digital pictures to be compared with the reference picture, or one input digital picture to be compared with P
reference digital pictures, we can make a P-time expansion. The time complexity will then be O(max(P × m, n));
with a uniprocessor, it will be O(P × m^3 × n). To indicate the most closely matched digital picture, we number
the digital pictures and add a register consisting of two parts: one part for the summation error, and the other
for the number of the digital picture. We also add a counter, which is initially set to zero and starts at the
(2m+n+3)th time unit.

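The operation of the register is as follows (the comparison is repeated each time the array outputs the
summation error of a numbered picture):

begin
   error.register := ∞;
   if error.register > error.array
   then begin
      error.register := error.array;
      index.register := counter
   end
end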

The final content of index.register indicates the number of the most closely matched digital picture. If we use
a three-dimensional array (P × m × n processing elements), the time complexity is reduced to O(max(P, m, n)).
The details are omitted here.
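The register update above amounts to a running minimum that also remembers where the minimum occurred. A Python sketch, assuming the array delivers one summation error per numbered picture in counter order, with the counter starting at zero; `most_matched` is a hypothetical name. Because the test is strict (error.register > error.array), the earliest-numbered picture wins ties.

```python
import math


def most_matched(errors):
    """errors[c] is the summation error reported while the counter reads c."""
    error_register, index_register = math.inf, None   # error.register := infinity
    for counter, error_array in enumerate(errors):
        if error_register > error_array:
            error_register = error_array
            index_register = counter
    return error_register, index_register


print(most_matched([7, 2, 5, 2]))  # (2, 1)
```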


IV. VERIFICATION OF THE PROPOSED ALGORITHM 
To verify Algorithm 2, we need the following lemmas and theorem.

Lemma 1: The identification signal arrives at the (i, j)th processing element at the (2i+j+2)th time unit.

Proof: The identification signal is sent at the fifth time unit, and it needs 2(i-1) time units to reach the
ith row; it then needs (j-1) time units to arrive at the (i, j)th processing element. In total,
5 + 2i - 2 + j - 1 = 2i + j + 2 time units.
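Lemma 1 can be checked by stepping the identification signal through the array. This Python sketch assumes exactly the movement rules of Algorithm 2 (entry at PE (1, 1) at the fifth time unit, two time units per step down, one per step right) and follows the same path as the proof: down the first column, then along the row. The name `id_signal_arrival` is hypothetical.

```python
def id_signal_arrival(m, n, start=5):
    """Arrival time of the identification signal at every PE (i, j) of an m x n array."""
    arrival = {(1, 1): start}
    for i in range(1, m + 1):
        if i > 1:
            arrival[(i, 1)] = arrival[(i - 1, 1)] + 2   # one PE down: two time units
        for j in range(2, n + 1):
            arrival[(i, j)] = arrival[(i, j - 1)] + 1   # one PE right: one time unit
    return arrival


times = id_signal_arrival(4, 3)
print(all(t == 2 * i + j + 2 for (i, j), t in times.items()))  # True (Lemma 1)
```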

Lemma 2: $\sum_{k=v}^{i} H_1(k)$ will be computed at the (v, j)th processing element at the (i+v+j-2)th time
unit, for all 1 ≤ v ≤ i, 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Proof: First consider the case j = 1. From the data arrangement in Fig. 3, the first input of the vth row
arrives at the boundary of the array at the 2(v-1)th time unit; then (i - v + 1) time units are needed to
compute $\sum_{k=v}^{i} H_1(k)$. In total, 2(v-1) + (i-v+1) = i + v - 1 time units are needed. Since the
computation of the (v, k)th processing element starts one time unit earlier than that of the (v, k+1)th
processing element, the (v, j)th processing element produces the summation at the (i+v+j-2)th time unit.

Theorem: After receiving the inputs, S_j(i) will be produced at the (2i+j+3)th time unit, for all 1 ≤ i ≤ m and
1 ≤ j ≤ n.

Proof: We prove the theorem by induction on i and j. 

Basis: First we consider the case i = j = 1. Since S_0(0) and S_0(1) are fixed values that already exist,
H_1(1) is input into the processing element (1, 1), which performs the accumulation in one time unit. Then
|H_2(1) - H_1(1)| is computed in another time unit. It is added to S_0(0) and delayed one time unit. At the
fourth time unit, the result is compared with S_1(0). When the identification signal arrives at the fifth time
unit, the result of the comparator is compared with S_0(1) + H_2(1), and S_1(1) is output, which needs one more
time unit: 6 = 2 × 1 + 1 + 3 = 2i + j + 3.


Induction Step: Our induction hypothesis is that all (p, q)th processing elements produce their outputs and
index-pairs at the (2p+q+3)th time unit, for all 1 ≤ p ≤ i and 1 ≤ q ≤ j.

Now consider the (i+1, j)th processing element. According to Lemma 2 and the hypothesis, $\sum_{k=v}^{i+1} H_1(k)$
will be computed at the (v, j)th processing element at the (i+1+v+j-2)th time unit, for all 1 ≤ v ≤ i,
1 ≤ i < m and 1 ≤ j ≤ n. Since the comparators are connected in a pipelined fashion,
$M = \min_{0 \le u \le i} \left\{ S_{j-1}(u) + \left| H_2(j) - \sum_{v=u+1}^{i+1} H_1(v) \right| \right\}$ will be output
at the (2i+j-1+2+3) = (2i+j+4)th time unit. Also, S_{j-1}(i+1) will be input at the (2(i+1)+(j-1)+3) = (2i+j+4)th
time unit, and at the same time N = S_{j-1}(i+1) + H_2(j) will be computed. According to Lemma 1, the
identification signal arrives at the (i+1, j)th processing element at the (2(i+1)+j+2)th time unit. Then M and
N are compared, and min{M, N} = S_j(i+1) is sent to the (i+1, j+1)th processing element at the
(2(i+1)+j+3)th time unit. Since S_{j+1}(i+1) is obtained one time unit later than S_j(i+1), it will be obtained at
the (2i+j+6) = (2(i+1)+(j+1)+3)th time unit. This completes the proof.

Corollary 1: The accumulated error and the index pairs can be obtained at the (2m+n+3) th time unit. 


Proof: This follows from the theorem with i = m and j = n.
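The timing argument of the theorem can be replayed numerically. The Python sketch below starts from the basis (S_1(1) at time unit 6) and applies the two increments used in the induction step, +2 time units per row down and +1 per column to the right. It then confirms the closed form 2i + j + 3 and Corollary 1. The name `output_times` is hypothetical, and the check covers the arithmetic only, not the hardware.

```python
def output_times(m, n):
    """Time unit at which S_j(i) leaves PE (i, j), built from the basis and the induction steps."""
    t = {(1, 1): 6}                              # basis: S_1(1) at time unit 6
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if (i, j) == (1, 1):
                continue
            if j > 1:
                t[(i, j)] = t[(i, j - 1)] + 1    # one column to the right: one time unit later
            else:
                t[(i, j)] = t[(i - 1, j)] + 2    # one row down: two time units later
    return t


m, n = 5, 4
t = output_times(m, n)
print(t[(m, n)], 2 * m + n + 3)  # 17 17
```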


V. ALGORITHM PARTITION 

We could use a one-dimensional array, or a two-dimensional array whose size differs from the problem size, by
performing time expansions following the partition rule.

A. Using the One-Dimensional Array 

First we assume that the size of the array is m. We can consider it as an m-space expansion along the X_1
direction. The input channels form queues. The register holds the initial value and the result from the CR
output, which is loaded into the register by the control signal. The control signal is sent at the (m+1)th time
unit and moves down one processing element per two time units. The input is repeated n times. The time
complexity will be O(m × n).
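Functionally, the one-dimensional scheme sweeps the n reference gray levels over a column of registers and feeds each pass's results back in. The Python sketch below emulates only that data flow, not the cycle-level behavior of the queues or the CR output. It assumes m + 1 registers (including the boundary value S_j(0)) and uses prefix sums for the inner sums; `one_dimensional_sweep` is a hypothetical name.

```python
def one_dimensional_sweep(h1, h2):
    """Return S_n(m) using only one column of registers, refreshed once per pass."""
    m = len(h1)
    prefix = [0]
    for value in h1:
        prefix.append(prefix[-1] + value)   # prefix[i] = H1(1) + ... + H1(i)
    registers = prefix[:]                   # before pass 1 the registers hold S_0(0..m)
    for h2_j in h2:                         # the input is repeated once per reference level j
        new = [registers[0] + h2_j]         # S_j(0) = S_{j-1}(0) + H2(j)
        for i in range(1, m + 1):
            new.append(min(registers[u] + abs(h2_j - (prefix[i] - prefix[u]))
                           for u in range(i + 1)))
        registers = new                     # results fed back for the next pass
    return registers[m]


print(one_dimensional_sweep([2, 2, 2], [3, 3]))  # 2
```

Storage stays O(m) because each pass needs only the previous column S_{j-1}(·), which mirrors the feedback into the registers described above.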

B. Using the Two-Dimensional Array with Dimensions k × l

If k = m and l = n, this is the case already discussed. We now consider the other cases. According to the
partition rule, we have to make an ⌈m/k⌉-time expansion and an ⌈n/l⌉-time expansion. There are also queues for
feeding back the data. The lengths of the queues vary with the values of m and n so that the right data meet at
the right processing element at the right time. This causes considerable difficulty for the control system and
the queue structures. Hence, we either use a sufficiently large VLSI architecture or use a one-dimensional array
to solve the partition problem.
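The bookkeeping behind the ⌈m/k⌉- and ⌈n/l⌉-time expansions can be sketched in Python: enumerate the blocks of an m × n problem that a k × l array would process, and check that, taken column block by column block, every S_j(i) finds all the values S_{j-1}(u), u ≤ i, that it needs already computed. The dependence on every earlier row block is one way to see why feedback queues are needed. Both function names are hypothetical, and queue lengths are not modeled.

```python
from math import ceil


def block_schedule(m, n, k, l):
    """Blocks ((first_row, last_row), (first_col, last_col)) in column-block-major order."""
    blocks = []
    for c in range(ceil(n / l)):            # ceil(n/l)-time expansion over columns j
        for b in range(ceil(m / k)):        # ceil(m/k)-time expansion over rows i
            rows = (b * k + 1, min((b + 1) * k, m))
            cols = (c * l + 1, min((c + 1) * l, n))
            blocks.append((rows, cols))
    return blocks


def dependencies_met(m, n, blocks):
    """True if each S_j(i) is computed only after all S_{j-1}(u), 0 <= u <= i."""
    done = {(0, i) for i in range(m + 1)} | {(j, 0) for j in range(n + 1)}  # S_0(i), S_j(0)
    for (r0, r1), (c0, c1) in blocks:
        for j in range(c0, c1 + 1):
            for i in range(r0, r1 + 1):
                if any((j - 1, u) not in done for u in range(i + 1)):
                    return False
                done.add((j, i))
    return len(done) == (m + 1) * (n + 1)


blocks = block_schedule(5, 4, 2, 3)
print(len(blocks), dependencies_met(5, 4, blocks))  # 6 True
```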


VI. CONCLUDING REMARKS 

We have proposed a VLSI architecture for digital picture comparison. Using a two-dimensional m × n array, the
time complexity is O(max(m, n)), where m is the number of gray levels of the input digital picture and n is the
number of gray levels of the reference digital picture. With a uniprocessor, the comparison has time complexity
O(m^3 × n) if the straightforward computation approach is used. If there are p reference pictures, the proposed
architecture solves the comparison in time O(max(m × p, n)), while a uniprocessor needs time O(m^3 × p × n). With
a three-dimensional array, this problem can be solved in time O(max(m, n, p)). One important issue in VLSI
design, algorithm partition, is discussed, and a formal verification of the proposed VLSI architecture is given.
The proposed architecture will be useful for remote sensing, satellite signal processing, and other related
areas. It can also be useful for other packing-related tasks and for real-time digital picture processing.